04:00
2026-06-15
arxiv.org
ai-agents
Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents
Researchers introduced Dialogue SWE-Bench, a benchmark for evaluating AI coding agents through dialogue with users, revealing that coding proficiency does not guarantee strong dialogue skills. The stu…